In [196]:
%%capture
%run "5 - Statistics.ipynb"
%run "8 - Gradient Descent.ipynb"
import matplotlib.pyplot as plt
import random
%matplotlib inline

The Model

Let us define a simple prediction function that takes two constants, $\alpha$ and $\beta$, plus an input $x_i$:


In [197]:
def predict(alpha, beta, x_i):
    return beta * x_i + alpha
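
For example, with the purely illustrative values $\alpha = 22.9$ and $\beta = 0.9$, a user with 10 friends would be predicted to spend $0.9 \cdot 10 + 22.9 = 31.9$ minutes per day:

predict(22.9, 0.9, 10) # 31.9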

What should we use for alpha and beta? If we know the actual output y_i, we can calculate the error of our prediction for that input:


In [198]:
def error(alpha, beta, x_i, y_i):
    return y_i - predict(alpha, beta, x_i)
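
Continuing the illustrative example: if that user actually spends 35 minutes per day, our prediction is off by $35 - 31.9 = 3.1$ minutes:

error(22.9, 0.9, 10, 35) # 3.1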

What we really care about is the total error over the entire data set. We don't just add up the raw errors (positive and negative errors would cancel out); instead we sum the squared errors:


In [199]:
def sum_of_squared_errors(alpha, beta, x, y):
    return sum(error(alpha, beta, x_i, y_i)**2 for x_i, y_i in zip(x, y))
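
Written out, this is the quantity we want to minimize:

$$\mathrm{SSE}(\alpha, \beta) = \sum_i \bigl(y_i - (\beta x_i + \alpha)\bigr)^2$$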

Now we just need to choose the values of alpha and beta that make the sum of squared errors as small as possible. The least-squares solution has a simple closed form:


In [200]:
def least_squares_fit(x, y):
    beta = correlation(x, y) * standard_deviation(y) / standard_deviation(x)
    alpha = mean(y) - beta * mean(x)
    return alpha, beta
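
These formulas are what you get by setting the partial derivatives of the sum of squared errors to zero (the calculus is only sketched here):

$$\frac{\partial\,\mathrm{SSE}}{\partial \alpha} = 0 \;\Rightarrow\; \alpha = \bar{y} - \beta\,\bar{x}, \qquad
\frac{\partial\,\mathrm{SSE}}{\partial \beta} = 0 \;\Rightarrow\; \beta = \frac{\operatorname{Cov}(x, y)}{\operatorname{Var}(x)} = \operatorname{corr}(x, y)\,\frac{\sigma_y}{\sigma_x}$$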

In [201]:
alpha, beta = least_squares_fit(num_friends_clean, daily_minutes_clean)
alpha, beta


Out[201]:
(22.94755241346903, 0.903865945605865)

In [202]:
predict(alpha, beta, 20)


Out[202]:
41.02487132558633

In [203]:
plt.title('Simple Linear Regression Model');
plt.ylabel('minutes per day');
plt.xlabel('# of friends');
plt.scatter(num_friends_clean, daily_minutes_clean);
plt.plot(range(0, 50), [predict(alpha, beta, x) for x in range(0, 50)], color='green');


Our model is pretty good for how simple it is! We can measure how well a model fits the data using the coefficient of determination (a.k.a. R-squared), which measures the fraction of the total variation in the dependent variable that is captured by the model.
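
In symbols, with $\hat{y}_i = \beta x_i + \alpha$ and $\bar{y}$ the mean of the $y_i$:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$$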


In [204]:
def total_sum_of_squares(y):
    """the total squared variation of y_i's from their mean"""
    return sum(v ** 2 for v in de_mean(y))

def r_squared(alpha, beta, x, y):
    """the fraction of variation in y captured by the model, which equals
    1 - the fraction of variation in y not captured by the model"""
    return 1.0 - (sum_of_squared_errors(alpha, beta, x, y) / total_sum_of_squares(y))

r_squared(alpha, beta, num_friends_clean, daily_minutes_clean) # 0.329


Out[204]:
0.3291078377836305

Higher R-squared values indicate a better-fitting model. The highest possible value is 1, which would mean the model predicts every observation exactly; and since the least-squares model does at least as well as always predicting mean(y), its R-squared can't drop below 0.

Using Gradient Descent

We can also estimate alpha and beta with stochastic gradient descent, minimizing the squared error one data point at a time:
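
The gradient of a single point's squared error with respect to $\theta = (\alpha, \beta)$ follows from the chain rule; writing $\varepsilon_i = y_i - (\beta x_i + \alpha)$ for the error:

$$\frac{\partial}{\partial \alpha}\,\varepsilon_i^2 = -2\,\varepsilon_i, \qquad \frac{\partial}{\partial \beta}\,\varepsilon_i^2 = -2\,\varepsilon_i\,x_i$$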


In [205]:
def squared_error(x_i, y_i, theta):
    alpha, beta = theta
    return error(alpha, beta, x_i, y_i)**2

def squared_error_gradient(x_i, y_i, theta):
    alpha, beta = theta
    return [-2 * error(alpha, beta, x_i, y_i), # alpha partial derivative
            -2 * error(alpha, beta, x_i, y_i) * x_i] # beta partial derivative

# choose random values to start
random.seed(0)
theta = [random.random(), random.random()]
alpha, beta = minimize_stochastic(squared_error, squared_error_gradient,
                                  num_friends_clean, daily_minutes_clean,
                                  theta, 0.0001)
alpha, beta


Out[205]:
(22.93746417548679, 0.9043371597664965)
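
As a quick sanity check (a sketch using made-up point and parameter values), the analytic gradient should closely match a central-difference approximation:

# hypothetical point and parameter values, used only for this check
x_check, y_check, theta_check = 5, 25, [22.9, 0.9]
h = 1e-6
numeric_gradient = [
    (squared_error(x_check, y_check, [theta_check[0] + h, theta_check[1]]) -
     squared_error(x_check, y_check, [theta_check[0] - h, theta_check[1]])) / (2 * h),
    (squared_error(x_check, y_check, [theta_check[0], theta_check[1] + h]) -
     squared_error(x_check, y_check, [theta_check[0], theta_check[1] - h])) / (2 * h),
]
numeric_gradient, squared_error_gradient(x_check, y_check, theta_check) # both approximately [4.8, 24.0]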

In [ ]: